FRAMID Transformation

Qiushan Tao
Co-leader, FHS-BAP Data Core
summary

The health exam datasets from the Framingham Heart Study (FHS) use 'idtype' and 'id' together to identify each participant uniquely. Some shared datasets instead use a single 'framid' variable for the same purpose. When merging data from both sources, the first step is to create 'framid' so that it can serve as the primary key. The programs below expect variable names in lowercase; if your data use uppercase names (for example IDTYPE and ID), rename them first, especially if your software is case-sensitive.
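
As a rough illustration of the idea above, the following Python sketch lowercases the column names of an exam dataset and then builds 'framid' by adding a per-idtype offset to 'id'. The offsets are the adjust_factor values used in the programs on this page; the function name add_framid and the sample IDs are hypothetical and chosen only for this example.

```python
import pandas as pd

# Offsets added to 'id' for each 'idtype' (same values as the adjust_factor
# column in the programs on this page).
ADJUST_FACTOR = {0: 0, 1: 80000, 2: 20000, 3: 30000, 7: 70000, 72: 720000}


def add_framid(df):
    """Return a copy of df with lowercase column names and a 'framid' column.

    Hypothetical helper for illustration only.
    """
    out = df.rename(columns=str.lower)
    # Series.map keeps row order; an unrecognized idtype maps to NaN.
    out["framid"] = out["id"] + out["idtype"].map(ADJUST_FACTOR)
    return out


# Dummy IDs with uppercase variable names, for demonstration only.
exam = pd.DataFrame({"IDTYPE": [1, 0, 72], "ID": [2, 1, 6]})
exam_with_framid = add_framid(exam)
print(exam_with_framid)
```

Using a dictionary lookup rather than a merge keeps the rows in their original order, which matters when the result is assigned back as a new column.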

note

The study IDs provided in this tutorial are dummy IDs used for demonstration purposes only and do not represent real data.

get_framid.r
#####################################################################
# License : This source code is licensed under the MIT license.
# Author(s) : FHS-BAP data core (QT).
# Release date : TBA
# Description : Get the framid from IDTYPE and ID in FHS Data.
# Usage : library(dplyr)
# : data <- data %>%
# : dplyr::mutate(framid = get_framid(idtype, id))
#####################################################################

get_framid <- function(idtype, id) {
  # Check if idtype and id have compatible lengths
  if (!((length(idtype) == 1 && length(id) >= 1) ||
        (length(idtype) == length(id)))) {
    stop("Error: The lengths of 'idtype' and 'id' don't match!")
  }

  # Define the id_matrix
  id_matrix <- data.frame(
    idtype = c(0, 1, 2, 3, 7, 72),
    cohort = c('Gen 1', 'Gen 2', 'NOS', 'Gen 3', 'Omni 1', 'Omni 2'),
    adjust_factor = c(0, 80000, 20000, 30000, 70000, 720000)
  )

  # Look up the adjust_factor for each idtype. match() keeps the input
  # order, whereas merge() sorts by the key by default and would misalign
  # the result with the rows of the calling data frame.
  # An unrecognized idtype yields NA.
  adjust_factor <- id_matrix$adjust_factor[match(idtype, id_matrix$idtype)]

  # Calculate the framid by adding id and adjust_factor
  return(id + adjust_factor)
}

The Python function used to retrieve the 'framid'.

get_framid.py
#########################################################################
# The Python function used to retrieve the 'framid' from idtype and id.
# NOTE: The study IDs provided in this tutorial are dummy IDs used for
# demonstration purposes only and do not represent real data.
#########################################################################

import pandas as pd

def get_framid(examdata):
    # Check that the required columns are present. (Columns of a DataFrame
    # always have the same length, so a length check is not needed here.)
    missing = {'idtype', 'id'} - set(examdata.columns)
    if missing:
        raise ValueError(f"Error: missing required column(s): {sorted(missing)}")

    # Create id_matrix DataFrame
    id_matrix_data = {
        'idtype': [0, 1, 2, 3, 7, 72],
        'cohort': ['Gen_1', 'Gen_2', 'NOS', 'Gen_3', 'Omni_1', 'Omni_2'],
        'adjust_factor': [0, 80000, 20000, 30000, 70000, 720000]
    }
    id_matrix = pd.DataFrame(id_matrix_data)

    # Left-merge examdata with id_matrix so that rows with an unrecognized
    # idtype are kept (with a missing framid) instead of being dropped
    merged_data = pd.merge(examdata, id_matrix, on='idtype', how='left')

    # Calculate framid
    merged_data['framid'] = merged_data['id'] + merged_data['adjust_factor']

    # Drop unnecessary columns
    return merged_data.drop(columns=['cohort', 'adjust_factor'])

###############################################################
# Example usage:
# (1) Create sample data for demonstration
# (2) Apply the get_framid() function to the sample data.
###############################################################
examdata = pd.DataFrame({
    'idtype': [0, 1, 2, 3, 7, 72],
    'id': [1, 2, 3, 4, 5, 6]
})

result = get_framid(examdata)
print(result)
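
Once 'framid' is available, the exam data can be joined to a shared dataset that is keyed on 'framid'. The sketch below is one possible way to do this with pandas; the shared table and its score column are hypothetical, and the framid values are the ones that follow from the dummy IDs used in the example above.

```python
import pandas as pd

# Exam data after framid has been added (dummy IDs, for demonstration only).
exam_with_framid = pd.DataFrame({
    "idtype": [0, 1, 2],
    "id": [1, 2, 3],
    "framid": [1, 80002, 20003],
})

# Hypothetical shared dataset keyed on framid; 'score' is a made-up column.
shared = pd.DataFrame({
    "framid": [80002, 1, 99999],
    "score": [10.5, 7.0, 3.2],
})

# how="left" keeps every exam row; validate="one_to_one" raises a MergeError
# if framid is duplicated in either table.
combined = exam_with_framid.merge(
    shared, on="framid", how="left", validate="one_to_one"
)
print(combined)
```

Exam rows without a match in the shared table are kept with a missing score, so nothing is lost silently during the join.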

The SAS code used to retrieve the 'framid'.

get_framid.sas
/*********************************************************************
* The SAS code used to retrieve the 'framid' from idtype and id.
* Create sample data for demonstration.
* NOTE: The study IDs provided in this tutorial are dummy IDs used for
* demonstration purposes only and do not represent real data.
**********************************************************************/
data examdata;
input idtype id;
datalines;
0 1
1 2
2 3
3 4
7 5
72 6
;
run;

data id_matrix;
input idtype cohort $ adjust_factor;
datalines;
0 Gen_1 0
1 Gen_2 80000
2 NOS 20000
3 Gen_3 30000
7 Omni_1 70000
72 Omni_2 720000
;
run;

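/* MERGE with a BY statement requires both datasets sorted by idtype */
proc sort data=examdata; by idtype; run;
proc sort data=id_matrix; by idtype; run;

data get_framid;
    merge examdata (in=in_exam) id_matrix;
    by idtype;
    /* Keep only rows that come from examdata */
    if in_exam;
    /* In a DATA step, n() counts non-missing arguments rather than */
    /* lengths, so check directly for a missing idtype or id        */
    if missing(idtype) or missing(id) then do;
        put "Error: 'idtype' or 'id' is missing!";
        _ERROR_ = 1;
    end;
    framid = id + adjust_factor;
    drop cohort adjust_factor;
run;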
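
Because 'framid' is meant to serve as the primary key, it may be worth checking that the computed values are non-missing and unique before merging. The Python sketch below shows one way to do that; check_framid is a hypothetical helper written for this illustration, and the values are dummy IDs.

```python
import pandas as pd


def check_framid(df):
    """Raise ValueError if 'framid' has missing or duplicated values.

    Hypothetical helper for illustration only.
    """
    if df["framid"].isna().any():
        raise ValueError("framid contains missing values")
    duplicated = df.loc[df["framid"].duplicated(keep=False), "framid"]
    if not duplicated.empty:
        values = sorted(duplicated.unique().tolist())
        raise ValueError(f"framid is not unique: {values}")
    return True


# Dummy values, for demonstration only.
good = pd.DataFrame({"framid": [1, 80002, 20003]})
print(check_framid(good))

bad = pd.DataFrame({"framid": [1, 80002, 80002]})
try:
    check_framid(bad)
except ValueError as err:
    print(err)
```

A missing framid would typically point to an idtype that is not in the lookup table, while a duplicate suggests repeated rows or overlapping ID ranges in the input.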